Rank | Count | Beginning |
---|---|---|
63809 | 26596 | El |
155192 | 21064 | La |
92191 | 19821 | En |
186535 | 9104 | Los |
228893 | 6023 | Por |
247712 | 5417 | Se |
173535 | 5379 | Las |
5213 | 4929 | A |
205911 | 4757 | No |
115148 | 4751 | Es |
292348 | 4539 | Y |
256163 | 4126 | Si |
217359 | 4087 | Para |
121585 | 3819 | Está |
127271 | 3387 | Este |
49656 | 3280 | De |
223483 | 3028 | Pero |
281999 | 2923 | Una |
281960 | 2724 | Un |
40263 | 2597 | Con |
270813 | 2593 | También |
259446 | 2342 | Sin |
266921 | 2202 | Su |
184392 | 2123 | Lo |
12912 | 2120 | Al |
22746 | 1972 | Aquí |
37439 | 1920 | Como |
55190 | 1896 | Desde |
7223 | 1779 | Además, |
130847 | 1566 | Esto |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
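For readers without database access, the same count can be sketched in Python. This is an illustrative sketch, not the tooling used to produce the table above: the function name `sentence_beginnings`, its parameters, and the sample sentences are assumptions made for the example. It mirrors `substring_index(sentence, ' ', n)` by splitting on single spaces and keeping the first `n` tokens, so it also covers the cases N=2, 3, 4.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper: splitting on a single space and rejoining the
    first n tokens mimics MySQL's substring_index(sentence, ' ', n).
    A sentence with fewer than n words is counted as a whole.
    """
    counts = Counter()
    for sentence in sentences:
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by count, descending, like "order by cnt desc"
    return counts.most_common(limit)


# Made-up sample sentences, for illustration only
sample = [
    "El gato duerme.",
    "El perro ladra.",
    "La casa es grande.",
    "Además, llueve.",
]

for beginning, count in sentence_beginnings(sample, n=1):
    print(count, beginning)
```

Because the split is on spaces only, punctuation stays attached to the token, which is why a beginning such as "Además," appears with its comma in the table.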